<!--begin.rcode results='hide', echo=FALSE, message=FALSE
library(caret)
data(BloodBrain)
theme1 <- caretTheme()
theme1$superpose.symbol$col = c(rgb(1, 0, 0, .4), rgb(0, 0, 1, .4), 
  rgb(0.3984375, 0.7578125,0.6445312, .6))
theme1$superpose.symbol$pch = c(15, 16, 17)
theme1$superpose.cex = .8
theme1$superpose.line$col = c(rgb(1, 0, 0, .9), rgb(0, 0, 1, .9), rgb(0.3984375, 0.7578125,0.6445312, .6))
theme1$superpose.line$lwd <- 2
theme1$superpose.line$lty = 1:3
theme1$plot.symbol$col = c(rgb(.2, .2, .2, .4))
theme1$plot.symbol$pch = 16
theme1$plot.cex = .8
theme1$plot.line$col = c(rgb(1, 0, 0, .7))
theme1$plot.line$lwd <- 2
theme1$plot.line$lty = 1

object <- 1

    hook_inline = knit_hooks$get('inline')
    knit_hooks$set(inline = function(x) {
      if (is.character(x)) highr::hi_html(x) else hook_inline(x)
      })
    opts_chunk$set(comment=NA, digits = 3)

library(pROC)

session <- paste(format(Sys.time(), "%a %b %d %Y"),
                 "using caret version",
                 packageDescription("caret")$Version,
                 "and",
                 R.Version()$version.string)
    end.rcode-->

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <!--
  Design by Free CSS Templates
http://www.freecsstemplates.org
Released for free under a Creative Commons Attribution 2.5 License

Name       : Emerald 
Description: A two-column, fixed-width design with dark color scheme.
Version    : 1.0
Released   : 20120902

-->
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta name="keywords" content="" />
  <meta name="description" content="" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Variable Importance</title>
  <link href='http://fonts.googleapis.com/css?family=Abel' rel='stylesheet' type='text/css'>
  <link href="style.css" rel="stylesheet" type="text/css" media="screen" />
  </head>
  <body>
  <div id="wrapper">
  <div id="header-wrapper" class="container">
  <div id="header" class="container">
  <div id="logo">
  <h1><a href="#">Variable Importance</a></h1>
</div>
  <!--
  <div id="menu">
  <ul>
  <li class="current_page_item"><a href="#">Homepage</a></li>
<li><a href="#">Blog</a></li>
<li><a href="#">Photos</a></li>
<li><a href="#">About</a></li>
<li><a href="#">Contact</a></li>
</ul>
  </div>
  -->
  </div>
  <div><img src="images/img03.png" width="1000" height="40" alt="" /></div>
  </div>
  <!-- end #header -->
<div id="page">
  <div id="content">
  <p>
Variable importance evaluation functions can be separated into two groups: those that use the model information and those that do not. The advantage of using a model-based approach is that is more closely tied to the model performance and that it <i>may</i> be able to incorporate the correlation structure between the predictors into the importance calculation. Regardless of how the importance is calculated:
<ul>
<li> For most classification models, each predictor will have a separate variable importance for each class (the exceptions are classification trees, bagged trees and boosted trees).</li>
<li> All measures of importance are scaled to have a maximum value of 100, unless the <span class="mx arg">scale</span> argument of <span class="mx funCall">varImp.train</span> is set to  <tt><!--rinline 'FALSE' --></tt>.</li>
</ul>
</p>

<h1>Model Specific Metrics</h1>
<p>
The following methods for estimating the contribution of each
variable to the model are available:
<p>
<ul>
<li>
<b>Linear Models:</b> the absolute value of the <i>t</i>-statistic
   for each model parameter is used.
</li>  
<li>
<b> Random Forest:</b> from the R package: "For
     each tree, the prediction accuracy on the out-of-bag portion of
     the data is recorded.  Then the same is done after permuting each
     predictor variable.  The difference between the two accuracies are
     then averaged over all trees, and normalized by the standard
     error.  For regression, the MSE is computed on the out-of-bag data
     for each tree, and then the same computed after permuting a
     variable.  The differences are averaged and normalized by the
     standard error.  If the standard error is equal to 0 for a
     variable, the division is not done."
</li>  
<li>
<b>Partial Least Squares</b>: the variable importance measure here is based on 
   weighted sums of the absolute regression coefficients. The weights are a function of
   the reduction of the sums of squares across the number of PLS components and are 
   computed separately for each outcome. Therefore, the contribution of the coefficients
  are weighted proportionally to the reduction in the sums of squares.      
</li>  
<li>
<b>Recursive Partitioning</b>: The reduction in the loss function
  (e.g. mean squared error) attributed to each variable at each split is 
  tabulated and the sum is returned. Also, since there may be candidate variables
  that are important but are not used in a split, the top competing variables are
  also tabulated at each split. This can be turned off using the <span class="mx arg">maxcompete</span>
  argument in <span class="mx funCall">rpart.control</span>. This method does not currently provide
  class-specific measures of importance when the response is a factor.
</li>  
<li>
<b>Bagged Trees</b>: The same methodology as a single tree is applied to 
  all bootstrapped trees and the total importance is returned
</li>  
<li>
<b>Boosted Trees</b>: This method uses the same approach as a single
  tree, but sums the importances over each boosting iteration (see the <a href="http://cran.r-project.org/web/packages/gbm/index.html"><strong>gbm</strong></a> package vignette).
</li>  
<li>
<b>Multivariate Adaptive Regression Splines</b>: MARS models 
        include a backwards elimination feature selection routine that
        looks at reductions in the generalized cross-validation (GCV)
        estimate of error. The <span class="mx funCall">varImp</span> function tracks the changes in
        model statistics, such as the GCV, for each predictor and
        accumulates the reduction in the statistic when each
        predictor's feature is added to the model. This total reduction
        is used as the variable importance measure. If a predictor was
        never used in any MARS basis function, it has an importance
        value of zero. There are three statistics that can be used to
        estimate variable importance in MARS models. Using
        <tt><!--rinline 'varImp(object, value = "gcv")' --></tt> tracks the reduction in the
        generalized cross-validation statistic as terms are added.
        However, there are some cases when terms are retained 
        in the model that result in an increase in GCV. Negative variable 
        importance values for MARS are set to zero. Terms with non-zero importance that were not included
        in the final, pruned model are also listed as zero. 
        Alternatively, using
        <tt><!--rinline 'varImp(object, value = "rss")' --></tt> monitors the change in the
        residual sums of squares (RSS) as terms are added, which will
        never be negative. 
        Also, the option <tt><!--rinline 'varImp(object, value = "nsubsets")' --></tt> 
        returns the number of times that each variable is involved in a subset (in the final, pruned model).  
        Prior to June 2008, <span class="mx funCall">varImp</span> used an internal function to estimate
        importance for MARS models. Currently, it is a wrapper around the 
        <span class="mx funCall">evimp</span> function in the <a href="http://cran.r-project.org/web/packages/earth/index.html"><strong>earth</strong></a> package.  
</li>  
<li>
<b>Nearest shrunken centroids</b>: The difference between the class
        centroids and the overall centroid is used to measure the
        variable influence (see <span class="mx funCall">pamr.predict</span>). The larger the
        difference between   the class centroid and the overall center
        of the data, the larger the separation between the classes. The
        training set predictions must be supplied when an object of
        class <span class="mx funCall">pamrtrained</span> is given to <span class="mx funCall">varImp</span>. 
</li>  
<li>
<b>Cubist</b>: The Cubist output contains variable usage statistics. It gives the percentage of times where each variable was used in a condition and/or a linear model. Note that this output will probably be inconsistent with the rules shown in the output from <span class="mx funCall">summary.cubist</span>. At each split of the tree, Cubist saves a linear model (after feature selection) that is allowed to have terms for each variable used in the current split or any split above it. <a href="http://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Quinlan-AI.pdf">Quinlan (1992)</a> discusses a smoothing algorithm where each model prediction is a linear combination of the parent and child model along the tree. As such, the final prediction is a function of all the linear models from the initial node to the terminal node. The percentages shown in the Cubist output reflects all the models involved in prediction (as opposed to the terminal models shown in the output). The variable importance used here is a linear combination of the usage in the rule conditions and the model. 
</li>
</ul>
</p>

<h1>Model Independent Metrics</h1>

<p>
If there is no model-specific way to estimate importance (or the argument <span class="mx arg">useModel</span><tt> = <!--rinline 'FALSE' --></tt> is used in <span class="mx funCall">varImp</span>) the importance of each predictor is evaluated individually using a "filter" approach. 
</p>
<p>
For classification, ROC curve analysis is conducted on each predictor. For two class problems, a series of cutoffs is applied to the predictor data to predict the class. The sensitivity and specificity are computed for each cutoff and the ROC curve is computed. The trapezoidal rule is used to compute the area under the ROC curve. This area is used as the measure of variable importance. For multi-class outcomes, the problem is decomposed into all pair-wise problems and the area under the curve is calculated for each class pair (i.e. class 1 vs. class 2, class 2 vs. class 3 etc.). For a specific class, the maximum area under the curve across the relevant pair-wise AUC's is used as the variable importance measure.
</p>
<p>
For regression, the relationship between each predictor and the outcome is evaluated. An argument, <span class="mx arg">nonpara</span>, is used to pick the model fitting technique. When <span class="mx arg">nonpara</span><tt> = <!--rinline 'FALSE' --></tt>, a linear model is fit and the absolute value of the <i>t</i>-value for the slope of the predictor is used. Otherwise, a loess smoother is fit between the outcome and the predictor. The R<sup>2</sup> statistic is calculated for this model against the intercept only null model. This number is returned as a relative measure of variable importance.
</p>

<h2>An Example</h2>
<p>
On the model training web, several models were fit to the example data. The boosted tree model has a built-in variable importance score but neither the support vector machine or the regularized discriminant analysis model do. 
</p>

<!--begin.rcode varImp_gbm
gbmImp <- varImp(gbmFit3, scale = FALSE)
gbmImp
    end.rcode-->

<p> The function automatically scales the importance scores to be between 0 and 100. Using <span class="mx arg">scale</span><tt> = <!--rinline 'FALSE' --></tt> avoids this normalization step. </p>

<p> To get the area under the ROC curve for each predictor, the <span class="mx funCall">filterVarImp</span> function can be used. The area under the ROC curve is computed for each class. </p>

<!--begin.rcode varImp_roc1
RocImp <- filterVarImp(x = training[, -ncol(training)], y = training$Class)
head(RocImp)
    end.rcode-->


<p>Alternatively, for models where no built-in importance score is implemented (or exists), the <span class="mx funCall">varImp</span> can still be used to get scores. For SVM classification models, the default behavior is to compute the area under the ROC curve. </p>

<!--begin.rcode varImp_roc2
RocImp2 <- varImp(svmFit, scale = FALSE)
RocImp2
    end.rcode-->

<p>For importance scores generated from <span class="mx funCall">varImp.train</span>, a plot method can be used to visualize the results. In the plot below, the <span class="mx arg">top</span> option is used to make the image more readable.</p>

<!--begin.rcode varImp_gbm_plot,fig.width=4,fig.height=7,tidy=FALSE
plot(gbmImp, top = 20)
    end.rcode-->


<div style="clear: both;">&nbsp;</div>
  </div>
  <!-- end #content -->
<div id="sidebar">
<ul>
  <li>
    <h2>General Topics</h2>
    <ul>
      <li><a href="index.html">Front Page</a></li>
      <li><a href="visualizations.html">Visualizations</a></li>
      <li><a href="preprocess.html">Pre-Processing</a><li>
      <li><a href="splitting.html">Data Splitting</a></li>
      <li><a href="varimp.html">Variable Importance</a></li>
      <li><a href="other.html">Model Performance</a></li>
      <li><a href="parallel.html">Parallel Processing</a></li>
    </ul>
    <h2>Model Training and Tuning</h2>
    <ul>
      <li><a href="training.html">Basic Syntax</a></li>
      <li><a href="modelList.html">Sortable Model List</a></li>
      <li><a href="bytag.html">Models By Tag</a></li>
      <li><a href="similarity.html">Models By Similarity</a></li>
      <li><a href="custom_models.html">Using Custom Models</a></li>
      <li><a href="sampling.html">Sampling for Class Imbalances</a></li> 
      <li><a href="random.html">Random Search</a></li> 
      <li><a href="adaptive.html">Adaptive Resampling</a></li> 
    </ul>
    <h2>Feature Selection</h2>
    <ul>
      <li><a href="featureselection.html">Overview</a>
      <li><a href="rfe.html">RFE</a></li>
      <li><a href="filters.html">Filters</a></li>
      <li><a href="GA.html">GA</a></li>
      <li><a href="SA.html">SA</a></li>
    </ul>  
  </li>
</ul>
</div>
<!-- end #sidebar -->
<div style="clear: both;">&nbsp;</div>
  </div>
  <div class="container"><img src="images/img03.png" width="1000" height="40" alt="" /></div>
  <!-- end #page -->
</div>
<!--begin.rcode echo = FALSE
knit_hooks$set(inline = hook_inline)    
    end.rcode--> 
  <div id="footer-content"></div>
  <div id="footer">
  <p>Created on <!--rinline I(session) -->.</p>
  </div>
  <!-- end #footer -->
</body>
  </html>
